A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription
Internal identifier: 001667 (Main/Exploration); previous: 001666; next: 001668
Authors: Denis Jouvet [France]; David Langlois [France]
Source:
- Lecture Notes in Computer Science [0302-9743]
Abstract
This paper introduces a new approach based on neural networks for selecting the vocabulary to be used in a speech transcription system. Indeed, nowadays, large sets of text data can be collected from web sources, and used in addition to more traditional text sources for building language models for speech transcription systems. However, web data sources lead to large amounts of heterogeneous data, and, as a consequence, standard vocabulary selection procedures based on unigram approaches tend to select unwanted and undesirable items as new words. As an alternative to unigram-based and empirical manual-based selection approaches, this paper proposes a new selection procedure that relies on a machine learning technique, namely neural networks. The paper presents and discusses the results obtained with the various selection procedures. The neural network based selection experiments are promising and they can handle automatically various detailed information in the selection process.
DOI: 10.1007/978-3-642-40585-3_9
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 002850
- to stream Istex, to step Curation: 002816
- to stream Istex, to step Checkpoint: 000293
- to stream Main, to step Merge: 001679
- to stream Main, to step Curation: 001667
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription</title>
<author><name sortKey="Jouvet, Denis" sort="Jouvet, Denis" uniqKey="Jouvet D" first="Denis" last="Jouvet">Denis Jouvet</name>
</author>
<author><name sortKey="Langlois, David" sort="Langlois, David" uniqKey="Langlois D" first="David" last="Langlois">David Langlois</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:AAD8BF6B01D32A09E76BE939D42E64ED5C820A3E</idno>
<date when="2013" year="2013">2013</date>
<idno type="doi">10.1007/978-3-642-40585-3_9</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-G3FK94MV-V/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002850</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">002850</idno>
<idno type="wicri:Area/Istex/Curation">002816</idno>
<idno type="wicri:Area/Istex/Checkpoint">000293</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000293</idno>
<idno type="wicri:doubleKey">0302-9743:2013:Jouvet D:a:machine:learning</idno>
<idno type="wicri:Area/Main/Merge">001679</idno>
<idno type="wicri:Area/Main/Curation">001667</idno>
<idno type="wicri:Area/Main/Exploration">001667</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription</title>
<author><name sortKey="Jouvet, Denis" sort="Jouvet, Denis" uniqKey="Jouvet D" first="Denis" last="Jouvet">Denis Jouvet</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>Speech Group, LORIA Inria, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>Université de Lorraine, LORIA, UMR 7503, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>CNRS, LORIA, UMR 7503, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Langlois, David" sort="Langlois, David" uniqKey="Langlois D" first="David" last="Langlois">David Langlois</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>Speech Group, LORIA Inria, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>Université de Lorraine, LORIA, UMR 7503, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>CNRS, LORIA, UMR 7503, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper introduces a new approach based on neural networks for selecting the vocabulary to be used in a speech transcription system. Indeed, nowadays, large sets of text data can be collected from web sources, and used in addition to more traditional text sources for building language models for speech transcription systems. However, web data sources lead to large amounts of heterogeneous data, and, as a consequence, standard vocabulary selection procedures based on unigram approaches tend to select unwanted and undesirable items as new words. As an alternative to unigram-based and empirical manual-based selection approaches, this paper proposes a new selection procedure that relies on a machine learning technique, namely neural networks. The paper presents and discusses the results obtained with the various selection procedures. The neural network based selection experiments are promising and they can handle automatically various detailed information in the selection process.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement><li>Villers-lès-Nancy</li>
</settlement>
<orgName><li>Université de Lorraine</li>
</orgName>
</list>
<tree><country name="France"><region name="Grand Est"><name sortKey="Jouvet, Denis" sort="Jouvet, Denis" uniqKey="Jouvet D" first="Denis" last="Jouvet">Denis Jouvet</name>
</region>
<name sortKey="Jouvet, Denis" sort="Jouvet, Denis" uniqKey="Jouvet D" first="Denis" last="Jouvet">Denis Jouvet</name>
<name sortKey="Jouvet, Denis" sort="Jouvet, Denis" uniqKey="Jouvet D" first="Denis" last="Jouvet">Denis Jouvet</name>
<name sortKey="Langlois, David" sort="Langlois, David" uniqKey="Langlois D" first="David" last="Langlois">David Langlois</name>
<name sortKey="Langlois, David" sort="Langlois, David" uniqKey="Langlois D" first="David" last="Langlois">David Langlois</name>
<name sortKey="Langlois, David" sort="Langlois, David" uniqKey="Langlois D" first="David" last="Langlois">David Langlois</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001667 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001667 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:AAD8BF6B01D32A09E76BE939D42E64ED5C820A3E |texte= A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription }}
This area was generated with Dilib version V0.6.33.